
Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings

Hong, Harbin, Caldas, Sebastian, Leqi, Liu

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) increasingly appear in social science research (e.g., economics and marketing), it becomes crucial to assess how well these models replicate human behavior. In this work, using hypothesis testing, we present a quantitative framework to assess the misalignment between LLM-simulated and actual human behaviors in multiple-choice survey settings. This framework allows us to determine in a principled way whether a specific language model can effectively simulate human opinions, decision-making, and general behaviors represented through multiple-choice options. We applied this framework to a popular language model for simulating people's opinions in various public surveys and found that this model is ill-suited for simulating the tested sub-populations (e.g., across different races, ages, and incomes) for contentious questions. This raises questions about the alignment of this language model with the tested populations, highlighting the need for new practices in using LLMs for social science studies beyond naive simulations of human subjects.
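A natural instance of such a hypothesis test is a chi-square test of homogeneity between the LLM-simulated and the real answer distributions for a given question and sub-population. The sketch below is illustrative only: the counts are invented, and the paper's actual framework may use a different test statistic.

```python
# Hedged sketch: a chi-square test of homogeneity comparing the answer
# distribution of an LLM-simulated sub-population with real survey counts.
# The counts below are illustrative, not taken from the paper.
from scipy.stats import chi2_contingency

human_counts = [120, 80, 40, 10]   # respondents picking options A-D
llm_counts   = [60, 110, 20, 60]   # LLM simulations of the same group

chi2, p_value, dof, _ = chi2_contingency([human_counts, llm_counts])

# A small p-value rejects the null hypothesis that both samples come
# from the same answer distribution, i.e. it flags misalignment.
misaligned = p_value < 0.05
```

With these toy counts the two distributions differ sharply, so the test rejects the null; the paper's contribution is running such principled comparisons across many questions and demographic slices rather than eyeballing aggregate agreement.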


Proof-Carrying Neuro-Symbolic Code

Komendantskaya, Ekaterina

arXiv.org Artificial Intelligence

This invited paper introduces the concept of "proof-carrying neuro-symbolic code" and explains its meaning and value, from both the "neural" and the "symbolic" perspectives. The talk outlines the first successes and challenges that this new area of research faces. Keywords: Neural Networks, Cyber-Physical System Verification, Programming Languages, Neuro-Symbolic Programs. 1 Neuro-Symbolic Proofs and Programs Proof-carrying code is a long tradition within programming language research, broadly referring to methods that interleave verification with executable code, thus avoiding the inevitable discrepancies that arise when the code and the proofs are handled in different languages. Although the term was coined by Necula [50] almost three decades ago, with time it grew to encompass any language powerful enough to handle both the coding and the proving. Examples are dependently-typed (Agda, Idris, Coq/Rocq) and refinement-typed (F*, Liquid Haskell) languages.


Neural Network Verification is a Programming Language Challenge

Cordeiro, Lucas C., Daggitt, Matthew L., Girard-Satabin, Julien, Isac, Omri, Johnson, Taylor T., Katz, Guy, Komendantskaya, Ekaterina, Lemesle, Augustin, Manino, Edoardo, Šinkarovs, Artjoms, Wu, Haoze

arXiv.org Artificial Intelligence

Neural network verification is a new and rapidly developing field of research. So far, the main priority has been establishing efficient verification algorithms and tools, while proper support from the programming language perspective has been considered secondary or unimportant. Yet, there is mounting evidence that insights from the programming language community may make a difference in the future development of this domain. In this paper, we formulate neural network verification challenges as programming language challenges and suggest possible future solutions.


Predicting the Geothermal Gradient in Colombia: a Machine Learning Approach

Mejía-Fragoso, Juan Camilo, Florez, Manuel A., Bernal-Olaya, Rocío

arXiv.org Artificial Intelligence

Accurate determination of the geothermal gradient is critical for assessing the geothermal energy potential of a given region. Of particular interest is the case of Colombia, a country with abundant geothermal resources. A history of active oil and gas exploration and production has left drilled boreholes in different geological settings, providing direct measurements of the geothermal gradient. Unfortunately, large regions of the country where geothermal resources might exist lack such measurements. Indirect geophysical measurements are costly and difficult to perform at regional scales. Computational thermal models could be constructed, but they require very detailed knowledge of the underlying geology and uniform sampling of subsurface temperatures to be well-constrained. We present an alternative approach that leverages recent advances in supervised machine learning and available direct measurements to predict the geothermal gradient in regions where only global-scale geophysical datasets and coarse geological knowledge are available. We find that a Gradient Boosted Regression Tree algorithm yields optimal predictions and extensively validate the trained model. We show that predictions of our model are within 12% accuracy and that independent measurements performed by other authors agree well with our model. Finally, we present a geothermal gradient map for Colombia that highlights regions where further exploration and data collection should be performed.
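The modelling choice described above, a gradient-boosted regression tree fit to geophysical covariates, can be sketched with scikit-learn. Everything here is a stand-in: the features, targets, and hyperparameters are synthetic placeholders, not the authors' dataset or tuned model.

```python
# Hedged sketch of a gradient-boosted regression tree predicting a
# geothermal gradient (degC/km) from geophysical features. Data are
# synthetic placeholders for covariates such as heat flow or crustal
# thickness; this is not the paper's dataset or configuration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # 500 synthetic "boreholes"
y = 25 + 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=1.0, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

r2 = model.score(X_te, y_te)   # held-out coefficient of determination
```

In practice, the trained model would then be evaluated against held-out borehole measurements and applied over a regional grid of the covariates to produce a gradient map.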


NLP Verification: Towards a General Methodology for Certifying Robustness

Casadio, Marco, Dinkar, Tanvi, Komendantskaya, Ekaterina, Arnaboldi, Luca, Daggitt, Matthew L., Isac, Omri, Katz, Guy, Rieser, Verena, Lemon, Oliver

arXiv.org Artificial Intelligence

Deep neural networks have exhibited substantial success in the field of Natural Language Processing and ensuring their safety and reliability is crucial: there are safety-critical contexts where such models must be robust to variability or attack, and give guarantees over their output. Unlike Computer Vision, NLP lacks a unified verification methodology and, despite recent advancements, existing works are often light on the pragmatic issues of NLP verification. In this paper, we attempt to distil and evaluate general components of an NLP verification pipeline that emerge from the progress in the field to date. Our contributions are two-fold. Firstly, we give a general (i.e. algorithm-independent) characterisation of verifiable subspaces that result from embedding sentences into continuous spaces. We identify, and give an effective method to deal with, the technical challenge of semantic generalisability of verified subspaces; and propose it as a standard metric in NLP verification pipelines (alongside the standard metrics of model accuracy and model verifiability). Secondly, we propose a general methodology to analyse the effect of the embedding gap -- a problem that refers to the discrepancy between verification of geometric subspaces and the semantic meaning of the sentences which the geometric subspaces are supposed to represent. In extreme cases, poor choices in embedding of sentences may invalidate verification results. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap; in particular, we propose the metric of falsifiability of semantic subspaces as another fundamental metric to be reported as part of the NLP verification pipeline. We believe that together these general principles pave the way towards a more consolidated and effective development of this new domain.


NLP for Maternal Healthcare: Perspectives and Guiding Principles in the Age of LLMs

Antoniak, Maria, Naik, Aakanksha, Alvarado, Carla S., Wang, Lucy Lu, Chen, Irene Y.

arXiv.org Artificial Intelligence

Ethical frameworks for the use of natural language processing (NLP) are urgently needed to shape how large language models (LLMs) and similar tools are used for healthcare applications. Healthcare faces existing challenges including the balance of power in clinician-patient relationships, systemic health disparities, historical injustices, and economic constraints. Drawing directly from the voices of those most affected, and focusing on a case study of a specific healthcare setting, we propose a set of guiding principles for the use of NLP in maternal healthcare. We led an interactive session centered on an LLM-based chatbot demonstration during a full-day workshop with 39 participants, and additionally surveyed 30 healthcare workers and 30 birthing people about their values, needs, and perceptions of NLP tools in the context of maternal health. We conducted quantitative and qualitative analyses of the survey results and interactive discussions to consolidate our findings into a set of guiding principles. We propose nine principles for ethical use of NLP for maternal healthcare, grouped into three themes: (i) recognizing contextual significance, (ii) holistic measurements, and (iii) who/what is valued. For each principle, we describe its underlying rationale and provide practical advice. This set of principles can provide a methodological pattern for other researchers and serve as a resource to practitioners working on maternal health and other healthcare fields to emphasize the importance of technical nuance, historical context, and inclusive design when developing NLP technologies for clinical use.


Counterfactuals Modulo Temporal Logics

Finkbeiner, Bernd, Siber, Julian

arXiv.org Artificial Intelligence

Evaluating counterfactual statements is a fundamental problem for many approaches to causal reasoning [40]. Such reasoning can, for instance, be used to explain erroneous system behavior with a counterfactual statement such as 'If the input i at the first position of the observed computation π had not been enabled, then the system would not have reached an error e', which can be formalized using the counterfactual operator □→ and the temporal operator F as π ⊨ (¬i) □→ ¬(F e). Since the foundational work by Lewis [38] on the formal semantics of counterfactual conditionals, many applications for counterfactuals [28, 5, 34, 46, 3, 15] and some theoretical results on the decidability of the original theory [37] and related notions [20, 2] have been discovered. Still, certain domains have proven elusive for a long time, for instance, theories involving higher-order reasoning and an infinite number of variables. In this paper, we consider a domain that combines both of these aspects: temporal reasoning over infinite sequences. In particular, we consider counterfactual conditionals that relate two properties expressed in temporal logics, such as the temporal property F e from the introductory example. Temporal logics are used ubiquitously as high-level specifications for verification [21, 4] and synthesis [22, 41], and recently have also found use in specifying reinforcement learning tasks [32, 39]. Our work lifts the language of counterfactual reasoning to similar high-level expressions.
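For readers unfamiliar with the counterfactual operator, the classic Lewis-style truth condition can be sketched as follows. The notation is illustrative (it assumes a similarity ordering on traces and the so-called Limit Assumption); the paper adapts such semantics to infinite computations.

```latex
% Lewis-style evaluation of a counterfactual \varphi \boxright \psi at a
% trace \pi, under a similarity ordering \preceq_\pi: the consequent must
% hold on all \varphi-traces that are minimally distant from \pi.
\pi \models \varphi \boxright \psi
  \iff
  \forall \pi' \in \min\nolimits_{\preceq_\pi}
      \{\, \pi'' \mid \pi'' \models \varphi \,\} :\;
  \pi' \models \psi
```

Intuitively: among all counterfactual computations where the antecedent holds, only the ones most similar to the observed computation matter for evaluating the conditional.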


How banks and fintech are using artificial intelligence to deliver loans - The Goa Sportlight

#artificialintelligence

Financial technology services are increasingly large and diverse, representing a change not only for users but also for banks, which have had to adapt as new developments allow greater knowledge of the market and customers. Against this backdrop, a platform has been launched in Colombia that will use advanced artificial intelligence to generate a credit score for each person and allow financial institutions to identify potential clients. The new system is developed by the fintech Yabx, which specializes in enabling credit for unbanked sectors. Through an alliance with the telecommunications operator Claro, it will base its scores on telecom data, allowing the identification of new clients not recognized by the criteria of traditional banking. The platform will use machine-learning algorithms to provide a credit score and other products that can be offered to banks or other fintech companies that want to improve their ability to acquire and qualify customers whose applications are rejected by traditional banks. Thanks to the association with Claro, one of the largest telecommunications networks in the country, the new system will be able to cover around 67% of Colombian adults and will allow credit institutions to reduce their rejection rates by up to 40% by taking into account factors that are not normally observed.



Scalable Prototype Selection by Genetic Algorithms and Hashing

Plasencia-Calaña, Yenisel, Orozco-Alzate, Mauricio, Méndez-Vázquez, Heydi, García-Reyes, Edel, Duin, Robert P. W.

arXiv.org Machine Learning

Classification in the dissimilarity space has become a very active research area since it provides a possibility to learn from data given in the form of pairwise non-metric dissimilarities, which otherwise would be difficult to cope with. The selection of prototypes is a key step for the further creation of the space. However, despite previous efforts to find good prototypes, how to select the best representation set remains an open issue. In this paper we propose scalable methods to select the set of prototypes out of very large datasets. The methods are based on genetic algorithms, dissimilarity-based hashing, and two different unsupervised and supervised scalable criteria. The unsupervised criterion is based on the Minimum Spanning Tree of the graph created by the prototypes as nodes and the dissimilarities as edges. The supervised criterion is based on counting matching labels of objects and their closest prototypes. The suitability of this type of algorithm is analyzed for the specific case of dissimilarity representations. The experimental results showed that the methods select good prototypes taking advantage of the large datasets, and they do so at low runtimes. Preprint submitted to Elsevier December 27, 2017. 1. Introduction The vector space representation is a common option to represent the data for learning tasks since many statistical techniques are applicable for this kind of representation. However, there is an increasing number of real-world problems which are not vectorial. Instead, the data are given in terms of pairwise dissimilarities which may be non-Euclidean and even non-metric.
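The supervised criterion mentioned above (counting objects whose label matches that of their closest prototype) can be sketched directly from a dissimilarity matrix. The function name, toy data, and dissimilarity are illustrative choices, not the paper's implementation.

```python
# Hedged sketch of the supervised selection criterion: score a candidate
# prototype set by counting objects whose label matches the label of
# their nearest prototype under the given dissimilarities.
import numpy as np

def supervised_score(D, labels, prototype_idx):
    """D: (n, n) pairwise dissimilarity matrix; labels: (n,) class labels;
    prototype_idx: indices of the candidate prototypes."""
    cols = np.asarray(prototype_idx)
    # For each object, find its closest prototype...
    nearest = cols[np.argmin(D[:, cols], axis=1)]
    # ...and count how often the labels agree.
    return int(np.sum(labels[nearest] == labels))

# Toy example: two tight 1-D clusters, one prototype drawn from each.
points = np.array([[0.0], [0.1], [5.0], [5.1]])
D = np.abs(points - points.T)          # non-negative dissimilarities
labels = np.array([0, 0, 1, 1])
score = supervised_score(D, labels, [0, 2])   # one prototype per class
```

A genetic algorithm would then evolve candidate index sets, using a score such as this (or the MST-based unsupervised criterion) as the fitness function, with hashing used to keep nearest-prototype lookups scalable.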